Computer and Modernization (计算机与现代化) ›› 2012, Vol. 1 ›› Issue (11): 35-38. doi: 10.3969/j.issn.1006-2475.2012.11.010

• Artificial Intelligence •

Optimization of Search Results Based on Clustering Algorithm (基于搜索结果的聚类算法)

LUO Zhao-hang (罗钊航), LI Xu-wei (李旭伟)

  1. College of Computer Science, Sichuan University, Chengdu 610065, China
  • Received: 2012-07-13 Revised: 1900-01-01 Online: 2012-11-10 Published: 2012-11-10

Abstract: Current search engines return many redundant results and offer no guided classification of them. This paper proposes an optimization algorithm for web search results based on an improved density-based clustering algorithm, DBSCAN (density-based spatial clustering of applications with noise), which clusters and classifies the results effectively. The algorithm selects the pages whose search weight exceeds a given threshold, extracts each page's feature values and candidate keywords, marks the feature ranges, and then compares page similarity to eliminate duplicate and redundant pages as far as possible. It also provides classifications according to the pages' candidate keywords, improving the precision of search results and user satisfaction and making the search engine behave more intelligently.

Key words: density-based clustering algorithm, DBSCAN algorithm, page similarity, clustering, redundant pages
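
The abstract gives only the outline of the pipeline, so the following Python sketch is an illustration under stated assumptions, not the authors' implementation. It filters pages by weight, clusters them with a plain (unimproved) DBSCAN, keeps the highest-weight page of each cluster, and labels the cluster with its most frequent candidate keyword. The Jaccard similarity over term sets, the page fields (`url`, `weight`, `terms`, `keywords`), the function names, and the parameter values `min_weight`, `eps`, and `min_pts` are all hypothetical choices made for this example.

```python
from collections import Counter

def jaccard(a, b):
    """Jaccard similarity of two term sets (an assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def dbscan(items, eps, min_pts, dist):
    """Plain DBSCAN. Returns one label per item; -1 marks noise."""
    labels = [None] * len(items)  # None means "not visited yet"

    def neighbors(i):
        # Includes i itself, as in the usual DBSCAN definition.
        return [j for j in range(len(items)) if dist(items[i], items[j]) <= eps]

    cluster = -1
    for i in range(len(items)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels

def cluster_results(pages, min_weight=0.5, eps=0.5, min_pts=2):
    """Return (topic, url) pairs: one representative page per cluster."""
    # Step 1: keep only pages whose weight exceeds the threshold.
    kept = [p for p in pages if p["weight"] > min_weight]
    # Step 2: cluster by term-set distance (1 - Jaccard similarity).
    labels = dbscan(kept, eps, min_pts,
                    lambda p, q: 1.0 - jaccard(p["terms"], q["terms"]))
    groups = {}
    for page, label in zip(kept, labels):
        groups.setdefault(label, []).append(page)
    results = []
    for label, members in groups.items():
        if label == -1:
            # Noise pages resemble nothing else, so none of them is redundant.
            results.extend((None, p["url"]) for p in members)
            continue
        # Step 3: drop near-duplicates by keeping the highest-weight page.
        best = max(members, key=lambda p: p["weight"])
        # Step 4: name the cluster after its most frequent candidate keyword.
        common = Counter(k for p in members for k in p["keywords"]).most_common(1)
        results.append((common[0][0] if common else None, best["url"]))
    return results

pages = [
    {"url": "a", "weight": 0.9, "terms": {"python", "snake", "reptile"}, "keywords": ["animal"]},
    {"url": "b", "weight": 0.8, "terms": {"python", "snake", "reptile", "zoo"}, "keywords": ["animal"]},
    {"url": "c", "weight": 0.7, "terms": {"python", "code", "language"}, "keywords": ["programming"]},
    {"url": "d", "weight": 0.95, "terms": {"python", "code", "language", "tutorial"}, "keywords": ["programming"]},
    {"url": "e", "weight": 0.1, "terms": {"python", "code"}, "keywords": ["programming"]},
]
print(cluster_results(pages))
```

In this toy input, page `e` falls below the weight threshold and is never clustered, while the two near-duplicate pairs collapse to one representative each. A quadratic neighbor search is used for brevity; a real result set would likely need an index or a library implementation.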
